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Abstract. A MATLAB implementation of the More-Sorensen sequential (MSS) method 
is presented. The MSS method computes the minimizer of a quadratic function defined by 
a limited-memory BEGS matrix subject to a two-norm trust-region constraint. This solver 
is an adaptation of the More-Sorensen direct method into an L-BEGS setting for large-scale 
optimization. The MSS method makes use of a recently proposed stable fast direct method 
for solving large shifted BEGS systems of equations [9l |8] . This MATLAB implementation 
is a matrix-free iterative method for large-scale optimization. Numerical experiments show 
that the MSS method is able to compute solutions to high accuracy. 



1. Introduction 

In this paper we describe a MATLAB implementation for minimizing a quadratic function 
defined by a limited-memory BFGS (L-BFGS) matrix subject to a two-norm constraint, i.e., 
for a given Xk, 

minimize Q{p) = g'^p-\ — p^ Bp subject to ||p||2 < 5, (1) 

where g = Vf{xk), B is an L-BFGS approximation to V^f{xk), and 5 is a given positive 
constant. Approximately solving ([1]) is of interest to the optimization community, as it is 
the computational bottleneck of trust-region methods for large-scale optimization. Generally 
speaking, there is a trade-off in computational cost per subproblem and the number of 
overall trust-region iterations (i.e., function and gradient evaluations): The more accurate 
the subproblem solver, the fewer overall iterations required. Solvers that reduce the overall 
number of function and gradient evaluations are of particular interest when function (or 
gradient) evaluations are time-consuming, e.g., simulation-based optimization. This paper 
presents an algorithm to solve ([1]) to any user-defined accuracy. 

The earliest quasi- Newton methods to solve ([1]) in a trust-region context were designed for 
symmetric positive-definite Hessian approximations. These methods used the trust region 
only to restrict the length of the step (see, e.g., [TU [T71 [TJ [H]) and did not seek to solve 
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([I]) to high accuracy. In [THl [17] , Powell proposes a quasi-Newton trust-region method that 
accepts the Cauchy point Pc, the solution of the minimization problem 

minimize {Q{p) '■ p = —og} , 

as the approximate solution of ([1]) whenever Updh > ^- If the Cauchy point satisfies 
Ibclh < Powell computes the quasi-Newton step pg^ = —Bg, and if the quasi-Newton 
satisfies UpgTvlh < 5, this step is taken as the approximate solution of ([T]). If this second 
test is not satisfied, the approximate solution of the minimization problem is chosen to be a 
convex combination of the Cauchy step and the quasi-Newton step that lies on the bound- 
s-ry ||p||2 = ^ • This method is often referred to as a "dogleg" strategy. In [7], Dennis and 
Mei propose two modifications: They exchange the order of Powell's tests of the Cauchy 
step and the quasi-Newton step, and they modify the dogleg strategy into a double-dogleg 
strategy that biases the approximate minimizer more in the favor of the quasi-Newton direc- 
tion. In [11] , Kauffman extends the double-dogleg strategy to a limited- memory framework. 
These methods require products with both B and B~^, which are accomplished by updating 
a matrix factorization at each step. However, these methods only approximately solve ([T]); 
when the solution lies on the boundary, none of these methods seek to solve the minimization 
problem to high accuracy. 

Methods to solve ([T]) to high accuracy are often based on optimality conditions given in 
the following theorem (see, e.g.. Gay [10], Sorensen [19j, More and Sorensen [13] or Conn, 
Gould and Toint [5]): 

Theorem 1. Let 6 be a positive constant. A vector p* is a global solution of the trust-region 
subproblem (QP if and only if \\p*\\2 < S and there exists a unique a* > such that B + a* I 
is positive semidefinite and 

{B + a*I)p* = -g and a*{6 - \\p*\\2) = 0. (2) 

Moreover, if B + a* I is positive definite, then the global minimizer is unique. 

The More-Sorensen algorithm [T3] seeks {p*,a*) that satisfy the optimality conditions ([2]) 
by trading off between updating p and a. That is, each iteration, the method updates p 
(fixing a) by solving the linear system {B + al)p = —g using the Cholesky factorization of 
the B -\- al] then, a is updated using a safeguarded Newton method to find a root of 

\\p{^)h ^ 

The More-Sorensen direct method is arguably the best direct method for solving the trust- 
region subproblem; in fact, the accuracy of each solve can be specified by the user. While 
this method is practical for smaller-sized problems, in large-scale optimization it is too 
computationally expensive to compute and store Cholesky factorizations for unstructured 
Hessians. 

In Burke et al. [3], the authors propose an adaption of More-Sorensen method to the 
limited-memory BFGS setting. They begin by computing the quasi-Newton step Pqisi = —Bg. 
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If PqN satisfies the two-norm inequality constraint, the problem is solved; otherwise, Burke 
et al. find a solution ||p||2 that lies on the boundary of the trust region using More-Sorensen's 
method by computing a pair a) such that 

{B + al)p{a) = -g and a{5-Ma)h)=Q. (4) 

similar to the original direct method. The solution of the second equation in is computed 
using two Choleksy factorizations of M x M matrices, where M is the number of limited- 
memory updates. The method is derived using the Sherman-Morrison- Woodbury formula. 
While this technique is able to exploit properties of L-BFGS updates, there are potential 
instability issues related to their proposed use of the Sherman-Morrison- Woodbury that are 
not addressed. 

Lu and Monteiro [TS] also explore a More-Sorensen method implementation when B has 
special structure; namely, B = D + VEV'^, where D and E are positive diagonal matri- 
ces, and V has a small number of columns. Their approach uses the Sherman-Morrison- 
Woodbury formula to replace solves with {B + al) with solves with an M x M system 
composed of a diagonal plus a low rank matrix. Thus, this method is able to avoid comput- 
ing Choleksy factorizations. Like with there are potential stability issues that are not 
addressed regarding inverting the M x M matrix. 

Finally, Apostolopoulou et al. [21 E] derive a closed- form expression for {B + crl)^^ to 
solve the first equation in (jl]). The authors are able to explicitly compute the eigenvalues of 
B, provided M=l[2|,[T]orM = 2[T]. While their formula avoids potential instabilities 
associated the Sherman-Morrison- Woodbury formula, their formula is restricted to the case 
when the number of updates is at most two. 

1.1. Overview of the proposed methods. In this paper, we describe a new adaptation of 
the More-Sorensen solver into a large-scale L-BFGS setting. The proposed method, called the 
More-Sorensen sequential (MSS) method, is able to exploit the structure of BFGS matrices to 
solve the shifted L-BFGS system in ([2]) using a fast direct recursion method that the authors 
originally proposed in [9J. (This recursion was later proven to be stable in ^.) The MSS 
method is able to solve ([1]) to any prescribed accuracy. 

The paper is organized in five sections. In Section [2] we review L-BFGS quasi- Newton 
methods and introduce notation that will be used for the duration of this paper. Section H] 
includes numerical results comparing the More-Sorensen method and the MSS method. Fi- 
nally, Section O includes some concluding remarks and observations. 

1.2. Notation and Glossary. Unless explicitly indicated, || ■ || denotes the vector two- 
norm or its subordinate matrix norm. In this paper, all methods use L-BFGS updates and 
we assume they are selected to ensure the quasi- Newton matrices remain sufficiently positive 
definite. 

2. Background 



In this section, we begin with an overview of the L-BFGS quasi-Newton matrices described 
by Nocedal [15], defining notation that will be used throughout the paper. 



4 



JENNIFER B. ERWAY AND ROUMMEL F. MARCIA 



The L-BFGS quasi-Newton method generates a sequence of positive-definite matrices {Bj} 
from a sequence of vectors {i/j} and {sj} defined as 

Vj = - f{xj) and sj = x^+i - Xj, 

where j = 0, ... m — 1 where m < M, and M is the maximum number of allowed stored 
pairs [r/j, Sj). This method can be viewed as the BFGS quasi-Newton method where no more 
than the M most recently computed updates are stored and used to update an initial matrix 
Bq. The L-BFGS quasi-Newton approximation to the Hessian of / is implicitly updated as 
follows: 



m— 1 771—1 



7;=o i=o 

where 

= / Tr. ' = ^0 = ^rn ^, (6) 

^/siBiSi VVtSi 

and 7m, > is a constant. In practice, 7^ is often defined to be 7^ = s^-iym-i/llz/m-ilP 
(see, e.g., |12] or |I5]). In order to maintain that the sequence {Bi} is positive definite for 
i = 1, . . .m, each of the accepted pairs must satisfy yfsi > for i = 0, . . . , m — 1. 

Suppose that we have computed m updates (m < M) and have the following updates 
stored in S and Y: 

S =[so... Sm-i] and Y = [yo . . . ym_i] . 

The matrices S and Y are updated with the most recently computed vector pair (sm, Vm) as 
follows: 



Algorithm 2.1: Update S and Y. 

if m < M, 

S ^[S Sm]] Y ^ [Y Um]] m + 
else 

for i = 0, . . . m — 1 

Si ^ Si+i] Ui ^ Hi+i, 

end 

S ^ [so, . . .Sm_i]; Y i- [yo, . . .ym-i]; 
end 



Thus, at all times we have exactly m stored vectors with m < M. 

One of the advantages of using an L-BFGS quasi-Newton is that there is an efficient recur- 
sion relation to compute products with -B~^. Given a vector z, the following algorithm p^lTG] 
terminates with r = Bz}z: 
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Algorithm 2.2: Two- loop recursion to compute r = B^z. 



for /c = m — 1, . . . , 

Pk ^ 

q ^ q - akUk, 



end 



r 5o ^g; 

for A; = 0, . . . , m — 1 



/3 ^ PfcZ/fc'^; 



end 



The two-term recursion formula requires at most 0{Mn) multiplications and additions. 
To compute products with the L-BFGS quasi-Newton matrix, one may use the so-called 
"unrolling" formula, which requires 0{M'^n) multiplications, or one may use a compact 
matrix representation of the L-BFGS that can be used to compute products with the L- 
BFGS quasi-Newton matrix, which requires 0{Mn) multiplications (see, e.g., [H]). Further 
details on L-BFGS updates can found in [TSj ; further background on the BFGS updates can 
be found in [6J. 

Without loss of generality, for the duration of the paper we assume that i? is a symmetric 
positive-definite quasi-Newton matrix formed using m [m < M) L-BFGS updates. 

3. The MSS method 

In this section, we present the MSS method to solve the constrained optimization problem 
([1]). We begin by considering the More-Sorensen direct method proposed in f]3]. 

The More-Sorensen direct method seeks a pair (p, a) that satisfy the optimality conditions 
([2]) by alternating between updating p and a. In the case that the B is positive definite (as 
in L-BFGS matrices), the method simplifies to Algorithm 13.11 |14j. 




Algorithm 3.1: More-Sorensen Method. 

a^O; -B-^g; 
if IIpII < 6 
return; 
else 

while not converged do 
Factor B + al = R^R; 
Solve RT Rp = —g; 
Solve R^q = p; 




a ^ a + 



IpIP IIpI1~<^ . 
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end do 
end 



The update to a in Algorithm 13.11 can be shown to be Newton's method apphed ([3]). 
(Since B is positive definite, a safeguarded Newton's method unnecessary.) Convergence 
is predicated on solving the optimality conditions ([2]) to a prescribed accuracy. The only 
difficulty in implementing the More-Sorensen method in a large-scale setting is the shifted 
solve {B + cr/)j9 = —g. 

One method to directly solve systems of the form {B + al)x = ?/ is to view the system 
matrix as the sum of a I and rank-one L-BFGS updates to an initial diagonal matrix Bq. It 
is important distinguish between applying rank-one L-BFGS updates to Bq + al and viewing 
the system matrix as the sum of rank-one updates to Bq + al. To compute {B + (Tl)^^y, 
one cannot simply substitute Bq + al in for Bq in Algorithm (12. 2p . (For a discussion on this, 
see PI). In |9j, Erway and Marcia present a stable fast direct method for solving L-BFGS 
systems that are shifted by a constant diagonal matrix (stability is shown in [8]). Specifically, 
it is shown that products with {B + al)^^ can be computed provided 70" is bounded away 
from zero, where Bq = 7"^/. The following theorem is found in [H]: 

Theorem 2. Let 7 > and cr > 0. Suppose G = (7^"*^ + cr)I, and let H = Yl'i=o^^ where 

with [ai] and {hi} are defined as in Further, define Cm+i = G + Eq + ■ — h E^- If there 
exists some e > such that 70" > e, then B + al = G + H , and {B + al)^^ is given by 

{B + al) ^ = C2m-1 " V2m-lC2m-lE2m-lC2m-l 

where 

Gr^\ = Gr' - v,Gr'E,Gr\ ^ = 0, . . . , 2m - 1, 

and 

1 

~ 1 + trace {Gr^Ei) ' 

Proof. See [9]. □ 

This theorem is the basis for the following algorithm derived in [9] to compute products 
with {B + al)~^. The algorithm was shown to be stable in [8]. 

Algorithm 3.2: Recursion to compute x = {B + al)^^y. 

X ^ (7^1 + ay^y; 
for k = 0, . . . , 2m — 1 
if k even 

c ^ afc/2; 

else 

c ^ fe(fc-i)/2; 
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end 

rk <- (7~^ + cr)~^c; 
for i = 0, . . . , k — 1 

rk <- rk + {~iyvi{rfc)ri; 

end 

t;,^l/(l + (-l)'=+VJc); 
X ^ x + {-lYvk{rly)rk] 
end 



It is important to note that {oj} and {hi} need to be precomputed. Algorithm 13.21 requires 
at most C(M^n). Operations with Co = -Bo + o"-^ and Ci can be easily computed with 
minimal extra expense since Cq^ is a diagonal matrix. It is generally known that M may 
be kept small (for example, Byrd et al. |1] suggest M G [3,7]). When <^ n, the extra 
storage requirements and computations are affordable. 

3.1. Handling small a. In this section we discuss the case of solving {B + al)p = —g for 
small a. For stability, it is important to maintain 70" > e^. for a small positive €„. Thus, 
in order to use Algorithm 13. 2^ we require that both 7 > and a > ^/e^. The first 
requirement is easily met by thresholding 7, i.e., 7 i— minj^e^, 7}. (Recall that yfsi > 
for each i = 0, ... m — 1, and thus, 7 > 0.) When cr < ^/e^, we set a = and use the 
two-loop recursion (Algorithm 12. 2p to solve an unshifted L-BFGS system. 

3.2. The algorithm. The MSS method adapts the More-Sorensen method into an L-BFGS 
setting by solving {B + crl)p = —g using a recursion method instead of a Cholesky factoriza- 
tion. When a is sufficiently large, the recently proposed recursion algorithm (Algorithm 13. 2p 
is used to update s; otherwise, a ~ and the two-loop recursion (Algorithm 12. 2p is used. 
Also, note that the More-Sorensen method updates a using Newton's method applied to ([3]), 
i.e., 

a^a-<p{p)/<p'{p). (7) 

In Algorithm 13.11 this update is written in terms of the Cholesky factors. In the MSS algo- 
rithm, Cholesky factors are unavailable, and thus, the update to a is performed by explicitly 
computing the Newton step ([7]). For details on this update see [5J. 
The MSS method is summarized in Algorithm 13.31 



Algorithm 3.3: MSS Method. 

Specify 7 > ^/e^ where < e^- ^ 1; 
or^O; p^ -B-^g; 
if IIpII < 5 
return; 
else 

while not converged do 
0(p) ^ 1/lbll - 1/6; 
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if cr > 

Compute f) such that {B + a)p = —p using Algorithm I3.2| 

else 

Compute p such that Bp = —p using Algorithm 12. 2[ 

end 

^ -(P^p)/lbf ; 

a ^ a -(j){p)/(f)'{py, 
if (7 > ^ 

Compute p such that {B + a)p = —g using Algorithm I3.2( 

else 

Compute p such that Bp = —g using Algorithm I2.2t 

end 

end 
end 



3.3. Stopping criteria. When the solution to ([T]) lies on the constraint boundary \\p\\ = S, 
it is reasonable to accept p when is sufficiently close to 6. Thus, for a fixed r <^ 1, the 
MSS method tests 

\\\p\\ - 6\ < t6, (8) 

for convergence on the constraint boundary. This test may also be used in addition to the 
test IIpII < 6 for the quasi-Newton step PqN = —B^^g, making the condition 

\\p\\ < 6 or IIIpII — 6\ < tS. (9) 

4. Numerical Results 

In this section, we test the accuracy of the proposed More-Sorensen sequential (MSS) 
method. Numerical results were obtained using MATLAB implementations of the MSS and 
More-Sorensen methods. For all experiments, vector pairs {{si,yi)}, i = 0, . . .m — 1, (m < 
M), were randomly generated by computing two m x 1 vectors 5* and Y such that whenever 

sfiji < 0, the sign of Si was fiipped, i.e., Si < Sj. Thus, the sequence of L-BFGS matrices 

were positive definite. For each experiment, g and 6 were also randomly generated. The 
number of limited-memory updates m was set to 5. Finally, the lower bound on 7cr was 
taken to be = e, where e is machine precision. 

For the numerical experiments, the stopping criteria (|8]) and iQ were used with 

r ^ min{||^|| * 10"^ v^}, 

where e is machine precision. For each problem, no more than 500 iterations were allowed 
for either MSS method and More-Sorensen method. 

In the first set of experiments, random problems of size n=100, 500, 1000, 2000, and 5000 
were generated. The results of both solvers are presented in Table 1. For each problem. 
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Table 1. Comparison of results from the More-Sorensen (MS) method and 
the More-Sorensen Sequential (MSS) method on small problems. 



n 


5 


7 




9 




T 


MS error 


MSS error 


100 
500 
1000 
2500 
5000 


6.83e-01 
6.98e-02 
4.12e-01 
3.86e-01 
1.93e-01 


1.04e-01 
3.77e-02 
2.82e-02 
2.91e-02 
5.41e-03 


9.79e+00 
2.28e+01 
3.23e+01 
4.99e+01 
7. 18e+01 


1.49e-08 
1.49e-08 
1.49e-08 
1.49e-08 
1.49e-08 


6.17e-14 
3.57e-14 
4.54e-07 
4.37e-07 
1.04e-07 


2.21e-14 
1.69e-14 
1.62e-07 
1.85e-07 
3.49e-08 



Table 2. More-Sorensen Sequential (MSS) method on large problems. 



n 


6 


7 




9 




T 


MSS error 


10000 
50000 
100000 
500000 
1000000 


9.03e-01 
9.37e-01 
5.34e-01 
5.00e-01 
6.35e-01 


1.96e-02 
2.37e-03 
3.74e-03 
2.01e-03 
7.08e-04 


l.OOe+02 
2.22e+02 
3.15e+02 
7.06e+02 
l.OOe+03 


1.49e-08 
1.49e-08 
1.49e-08 
1.49e-08 
1.49e-08 


1.30e-09 
1.83e-ll 
1.24e-07 
2.57e-12 
1.39e-12 



(5,7, \\g\\, and r are given. The reported errors for each problem is the sum of the errors in 
the optimality conditions, i.e., 

error = \\{B + a*I)p + g\\ + \a*{5 - \\p*\\)\. (10) 

In addition to storage limitations in memory and because of the time requirements needed 
to compute Cholesky factorizations, the More-Sorensen method was not tested on problems 
with n > 5000. Both methods were able to solve all constrained optimization problems to 
the required tolerance. (Note that convergence depends on r, but the sum of the errors in 
the optimality conditions do not have to be less than r.) 

Larger values of n were chosen for the second set of experiments. We tested the MSS 
method on five random problems of large size and computed the error as in (fTOj) . Table 
2 reports these results. In all cases, the MSS method was able to solve the optimization 
problem to the required tolerance. 

5. Concluding Remarks 

In this paper, we have presented a MATLAB implementation of the MSS algorithm that 
is able to solve problems of the form ([T]). This solver is stable and numerical results confirm 
that the method can compute solutions to any prescribed accuracy. Future research includes 
implementing the MSS method in a trust-region algorithm for large-scale optimization. 
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